Search CORE

Shallow Text Clustering Does Not Mean Weak Topics: How Topic Identification Can Leverage Bigram Features

Author: Poncelet Pascal
Roche Mathieu
Velcin Julien
Publication venue: HAL CCSD
Publication date: 01/01/2016

DMNLP, co-located with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD). Text clustering and topic learning are two closely related tasks. In this paper, we show that topics can be learnt without strictly requiring an exact categorization. In particular, experiments performed on two real case studies with a vocabulary based on bigram features yield readable topics that cover most of the documents. Precision at 10 reaches up to 74% on a dataset of scientific abstracts with 10,000 features, which is 4% less than when using unigrams only, but the resulting topics are more interpretable.

Mesurer la proximité entre corpus par de nouveaux méta-descripteurs (Measuring the proximity between corpora using new meta-descriptors)

Author: Bouillot Flavien
Poncelet Pascal
Roche Mathieu
Publication venue: Association ARIA
Publication date: 01/01/2015

Given the number of existing classification algorithms, finding the one best suited to classifying a corpus of documents is a difficult task. Meta-classification now appears very useful for helping to determine, based on past experiments, which algorithm should be the most relevant for a given corpus. The underlying idea is that "if an algorithm has proved particularly well suited to one corpus, it should behave the same way on a sufficiently similar corpus". In this article, we propose new meta-descriptors based on notions of similarity to improve the meta-classification step. Experiments conducted on several real-world datasets show the relevance of our new descriptors. (Author's abstract)

HAL Descartes

Agritrop

HAL-CIRAD

SuMGra: Querying Multigraphs via Efficient Indexing

Author: Ienco Dino
Ingalalli Vijay
Poncelet Pascal
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 05/09/2016

Many real-world datasets can be represented by a network with a set of nodes interconnected by multiple relations. Such a rich graph is called a multigraph. Unfortunately, none of the existing algorithms for subgraph query matching can adequately leverage the multiple relationships that exist between the nodes. In this paper we propose an efficient indexing schema for querying single large multigraphs, where the indexing schema aptly captures the neighbourhood structure in the data graph. Our proposal, SuMGra, couples this novel indexing schema with a subgraph search algorithm to quickly traverse the solution space and enumerate all the matchings. Extensive experiments conducted on real benchmarks demonstrate the time efficiency as well as the scalability of SuMGra.

HAL Descartes

HAL-CIRAD

IFO2: A uniform approach for information system modelling

Author: Cicchetti Rosine
Poncelet Pascal
Teisseire Maguelonne
Publication venue: Universitat Politècnica de Catalunya. Departament de Llenguatges i Sistemes Informàtics
Publication date: 01/01/1994

This paper is devoted to the IFO2 conceptual model, an extension of the semantic IFO model defined by S. Abiteboul and R. Hull. Its originalities are a uniform approach for both structural and behavioural application specifications, a "whole-object" and "whole-event" approach, the use of constructors to express combinations of objects or events, and the modularity and re-usability of specifications in order to optimize the designer's work. Furthermore, it offers an overview of the modelled system. To complement the modelling part, IFO2 includes a derivation component to perform the implementation of specifications by using an object-oriented or an active DBMS.

UPCommons. Portal del coneixement obert de la UPC

GET_MOVE: An Efficient and Unifying Spatio-Temporal Pattern Mining Algorithm for Moving Objects

Author: Phan Nhat Hai
Poncelet Pascal
Teisseire Maguelonne
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 25/10/2012

Recent improvements in positioning technology have led to a much wider availability of massive moving object data. A crucial task is to find the moving objects that travel together. Usually, they are called spatio-temporal patterns. Due to the emergence of many different kinds of spatio-temporal patterns in recent years, different approaches have been proposed to extract them. However, each approach only focuses on mining a specific kind of pattern. Mining and managing patterns with such a large number of algorithms is both painstaking and time-consuming. Additionally, we have to execute these algorithms again whenever new data are added to the existing database. To address these issues, we first redefine spatio-temporal patterns in the itemset context. Secondly, we propose a unifying approach, named GeT Move, which uses a frequent closed itemset-based spatio-temporal pattern-mining algorithm to mine and manage different spatio-temporal patterns. GeT Move is implemented in two versions, GeT Move and Incremental GeT Move. Experiments are performed on real and synthetic datasets, and the experimental results show that our approaches are very effective and outperform existing algorithms in terms of efficiency.

HAL-CIRAD

From Terminology Extraction to Terminology Validation: An Approach Adapted to Log Files

Author: Bonniol Stéphane
Poncelet Pascal
Roche Mathieu
Saneifar Hassan
Publication venue: Graz University of Technology, Institut für Informationssysteme und Computer Medien
Publication date: 01/01/2015

Log files generated by computational systems contain relevant and essential information. In some application areas, such as the design of integrated circuits, log files generated by design tools contain information which can be used in management information systems to evaluate the final products. However, the complexity of such textual data raises some challenges concerning the extraction of information from log files. Log files are usually multi-source, multi-format, and have a heterogeneous and evolving structure. Moreover, they usually do not respect natural language grammar and structures even though they are written in English. Classical methods of information extraction, such as terminology extraction methods, are of little relevance in this context. In this paper, we introduce our approach, Exterlog, to extract terminology from log files. We detail how it deals with the specific features of such textual data. Performance is improved by favoring the most relevant terms of the domain, based on a scoring function which uses a Web and context based measure. The experiments show that Exterlog is a well-adapted approach for terminology extraction from log files.

ZENODO

HAL Descartes

Agritrop

ARPHA OAI-PMH Endpoint

ARPHA Preprints

HAL-CIRAD

Querying RDF Data Using A Multigraph-based Approach

Author: Ienco Dino
Ingalalli Vijay
Poncelet Pascal
Villata Serena
Publication venue: OpenProceedings.org
Publication date: 15/03/2016

RDF is a standard for the conceptual description of knowledge, and SPARQL is the query language conceived to query RDF data. RDF data is widely used in various domains such as the life sciences, the Semantic Web, social networks, etc. Further, its integration at Web scale compels RDF management engines to deal with queries that are complex in terms of both size and structure. In this paper, we propose AMbER (Attributed Multigraph Based Engine for RDF querying), a novel RDF query engine specifically designed to optimize the computation of complex queries. AMbER leverages subgraph matching techniques and extends them to tackle the SPARQL query problem. First, RDF data is represented as a multigraph, and novel indexing structures are then established to efficiently access the information from the multigraph. Finally, a SPARQL query is represented as a multigraph, and the SPARQL querying problem is reduced to the subgraph homomorphism problem. AMbER exploits structural properties of the query multigraph, as well as the proposed indexes, in order to tackle the problem of subgraph homomorphism. The performance of AMbER, in comparison with state-of-the-art systems, has been extensively evaluated over several RDF benchmarks. The advantages of employing AMbER for complex SPARQL queries have been experimentally validated.

HAL-UNICE

INRIA a CCSD electronic archive server

HAL Descartes

HAL-CIRAD

Node Overlap Removal Algorithms: A Comparative Study

Author: Chen Fati
Piccinini Laurent
Poncelet Pascal
Sallaberry Arnaud
Publication venue: HAL CCSD
Publication date: 02/10/2019

Appears in the Proceedings of the 27th International Symposium on Graph Drawing and Network Visualization (GD 2019). Many algorithms have been designed to remove node overlapping, and many quality criteria and associated metrics have been proposed to evaluate those algorithms. Unfortunately, a complete comparison of the algorithms based on some metrics that evaluate the quality has never been provided and it is thus difficult for a visualization designer to select the algorithm that best suits his needs. In this paper, we review 21 metrics available in the literature, classify them according to the quality criteria they try to capture, and select a representative one for each class. Based on the selected metrics, we compare 8 node overlap removal algorithms. Our experiment involves 854 synthetic and real-world graphs.

RetweetPatterns: Detection of Spatio-Temporal Patterns of Retweets

Author: Cunha Tiago
Ienco Dino
Poncelet Pascal
Rodrigues Tomy
Soares Carlos
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 22/03/2016

Social media is strongly present in people's everyday life and Twitter is one example that stands out. The data within these types of services can be analyzed in order to discover useful knowledge. One interesting approach is to use data mining techniques to perceive hidden behaviours and patterns. The primary focus of this paper is to identify patterns of retweets and to understand how information spreads over time on Twitter. The aim of this work is to adapt the GetMove tool, which is capable of extracting spatio-temporal pattern trajectories, and TweeProfiles, which identifies tweet profiles along several dimensions: spatial, temporal, social and content. We hope that the more flexible clustering strategy from TweeProfiles will enhance the results extracted by GetMove. We apply this mechanism to one case study and develop a visualization tool to interpret the results.

HAL Descartes

HAL-CIRAD

Mining microarray data to predict the histological grade of a breast cancer

Author: Bringay Sandra
Fabrègue Mickaël
Orsetti Béatrice
Poncelet Pascal
Teisseire Maguelonne
Publication venue: 'Elsevier BV'
Publication date: 01/12/2011

BACKGROUND: The aim of this study was to develop an original method to extract sets of relevant molecular biomarkers (gene sequences) that can be used for class prediction and can be included as prognostic and predictive tools. MATERIALS AND METHODS: The method is based on sequential patterns used as features for class prediction. We applied it to classify breast cancer tumors according to their histological grade. RESULTS: We obtained very good recall and precision for grade 1 and grade 3 tumors but, like other authors, obtained less satisfactory results for grade 2 tumors. CONCLUSIONS: We demonstrated the value of sequential patterns for class prediction from microarray data, and we now have the material to use them for prognostic and predictive applications.

Elsevier - Publisher Connector

HAL-Inserm

HAL-CIRAD